
[Backend Configuration Va] Basic user documentation #802

Merged
merged 131 commits into main from backend_config_docs
May 6, 2024

Conversation

CodyCBakerPhD
Member

fix #797

@CodyCBakerPhD CodyCBakerPhD self-assigned this Apr 2, 2024

And likewise for ``AVAILABLE_ZARR_COMPRESSION_METHODS``.

We can confirm these values are saved by re-printing that particular dataset configuration...
Contributor

Suggested change
We can confirm these values are saved by re-printing that particular dataset configuration...
We can confirm these values are saved by re-printing that particular dataset configuration:

It's not clear from this that these configurations made it back into the larger object. It's also not clear how you would use this configuration to modify an in-memory NWBFile object. It would also be great to write this file and then show that the written file followed the chunking and compression parameters we chose. This could be done using HDFView or some other app or API.
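For context, the pattern under discussion looks roughly like the following sketch (the dataset location and values are hypothetical, and the helper name assumes NeuroConv's nwb_helpers module):

from neuroconv.tools.nwb_helpers import get_default_backend_configuration

# Start from the defaults for an in-memory NWBFile (hypothetical variable `nwbfile`)
backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend="hdf5")

# Customize one dataset (this location key is purely illustrative)
dataset_configuration = backend_configuration.dataset_configurations["acquisition/TestTimeSeries/data"]
dataset_configuration.chunk_shape = (1_000, 64)
dataset_configuration.compression_method = "gzip"

# Re-print that particular dataset configuration to confirm the values were saved
print(dataset_configuration)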

Member Author

It's not clear from this that these configurations made it back into the larger object.

This was obscured by the odd injection of all the notes; with those moved, I put the two printouts side by side and added some spacing.

[screenshot: the two printouts displayed side by side]

It would also be great to write this file and then show that the written file followed the chunking and compression parameters we chose.

The point the demo here makes is just that the values get saved in the backend_configuration object itself. I can add something to prove that the file actually gets written that way (for chunks and compression, anyway; buffering can't be proven that way).

Though I will probably just demonstrate it with h5py, since showcasing HDFView could lead to annoyance: if a user wanted to try it, they'd need to sign up with the HDF Group to download it. Neurosift doesn't display this information either, though I'm sure that could be requested. We could also use the PyNWB HTML repr after hdmf-dev/hdmf#1100 is merged, but this section currently targets Python script / IPython usage, not Jupyter notebooks specifically.
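For example, a quick check with h5py might look like this sketch (the file path and dataset location are hypothetical):

import h5py

# Open the written file and inspect the dataset's storage settings
with h5py.File("example.nwb", mode="r") as file:
    dataset = file["acquisition/TestTimeSeries/data"]
    print(dataset.chunks)            # the chunk shape used on write, or None if contiguous
    print(dataset.compression)       # e.g. 'gzip', or None if uncompressed
    print(dataset.compression_opts)  # e.g. the gzip level, if applicable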

Member Author

Added a section to the FAQ on how to check that customizations (or the defaults, for that matter) actually configure the item correctly in the resulting file.

Contributor

OK great, could you please add a section on writing the data and confirming (somehow) that the data was written with the proper settings?

Member Author

I did; it's the last section of the FAQ. It uses the file written in the preceding section.

Contributor

OK, that section of the FAQ looks great, but I still feel it would be nice to have another snippet here showing how to apply the configs to an in-memory NWBFile object and write it to disk.
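For reference, applying a configuration to an in-memory NWBFile and writing it to disk might look like this sketch (assuming NeuroConv's configure_backend helper and PyNWB's NWBHDF5IO; nwbfile and backend_configuration would come from the earlier steps):

from neuroconv.tools.nwb_helpers import configure_backend
from pynwb import NWBHDF5IO

# Apply the (possibly customized) configuration to the in-memory NWBFile
configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)

# Then write to disk as usual
with NWBHDF5IO("example.nwb", mode="w") as io:
    io.write(nwbfile)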

Member Author

I don't understand what you mean here. What are you imagining, and how is it different from the preceding section?


**How do I disable chunking and compression completely?**

To completely disable chunking (i.e., 'contiguous' layout), set both ``chunk_shape=None`` and ``compression_method=None``.
Contributor

@bendichter bendichter May 6, 2024

Suggested change
To completely disable chunking (i.e., 'contiguous' layout), set both ``chunk_shape=None`` and ``compression_method=None``.
To completely disable chunking for HDF5 backends (i.e., 'contiguous' layout), set both ``chunk_shape=None`` and ``compression_method=None``. Zarr requires all datasets to be chunked.
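In code, that answer might look like the following sketch (hypothetical dataset location; HDF5 backend assumed):

# Disable both chunking and compression for one dataset (HDF5 only;
# the location key is purely illustrative)
dataset_configuration = backend_configuration.dataset_configurations["acquisition/TestTimeSeries/data"]
dataset_configuration.chunk_shape = None
dataset_configuration.compression_method = None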

Collaborator

Maybe not that important, and this may be what you meant, but you can pass chunks=None to Zarr and get all the data in one file:

import numpy as np
import zarr

# Create a Zarr group backed by a directory store
zarr_folder = zarr.open_group('test_zarr.zarr', mode='w')

# Create an array and add it to the Zarr group, with chunks=None
array_data = np.random.rand(20, 20, 20)
zarr_array = zarr_folder.create_dataset('array', data=array_data, chunks=None)
zarr_array.info  # in IPython, displays a summary of the array's storage
Name               : /array
Type               : zarr.core.Array
Data type          : float64
Shape              : (20, 20, 20)
Chunk shape        : (20, 20, 20)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 64000 (62.5K)
No. bytes stored   : 56455 (55.1K)
Storage ratio      : 1.1
Chunks initialized : 1/1

Member Author

I did not check myself (trusted @bendichter on that one).

Our config literally just passes the values on to the downstream calls, so I assume this would work here too, in which case we can remove that line as a minor point (even though chunks=None is the option, and even though the array reports 1/1 chunks, that's all 'equivalent' to a contiguous layout in a sense).

@CodyCBakerPhD CodyCBakerPhD requested a review from bendichter May 6, 2024 17:02
@bendichter
Contributor

I haven't had a chance to run through the code. If it all runs as advertised, this looks good to me.

@CodyCBakerPhD
Member Author

If it all runs as advertised, this looks good to me

I ran through it all again, fixed a couple of minor things now that the upstream ecephys change is through, and confirmed everything works as advertised.

We should keep in the back of our minds how to ensure this going forward, however. doctest doesn't really work for this scenario, since we broke the steps up across code blocks for the sake of readability (and doctest needs the variables to be in the same code block). That style works better when using a notebook as the doc base (as the PyNWB tutorials do), but we haven't done that yet in NeuroConv.

@CodyCBakerPhD CodyCBakerPhD merged commit 247ba16 into main May 6, 2024
25 checks passed
@CodyCBakerPhD CodyCBakerPhD deleted the backend_config_docs branch May 6, 2024 18:23

Successfully merging this pull request may close these issues.

[Documentation]: add docs for advanced backend support